Abstract

Background: The success of clinical information systems depends upon their effective integration into complex work systems involving distributed responsibility and decisionmaking. Human-computer interaction (HCI) deficiencies and mismatches between systems design and the structure of work create the potential for new paths to system failures (e.g., allergy lists not directly visible on a screen). Human factors methods are widely used in other industries and can predict some of these new failure paths, facilitating redesign to prevent accidents-in-the-making. This paper discusses the application of scenario-based usability testing in clinical health care settings. Methods: Using scenario-based usability testing methods, we investigated point-of-care software technology (e.g., barcoded medication administration [BCMA] and wireless medication administration [WMA]) to better understand the safety implications of HCI design decisions. The use of scenarios in usability testing focuses attention on specific aspects of the interface to identify pitfalls and system failures. The scenarios were developed after extensive ethnographic observation of medical work with the barcoding software and the computerized order entry system (COES). Results: The paper lays out the methodology of scenario-based usability testing for use in health care. Using this method, we identified new paths to failure and recommended software changes to simplify and support the user’s tasks. Scenario-based testing also identified workplace performance trade-offs related to time and production pressures. Conclusion: Scenario-based usability testing is an important methodology for characterizing how human-software interaction contributes to success or failure in clinical system implementations. Usability testing can identify and promote data-driven design choices drawn from practitioners’ use of the system in a busy work environment. Human factors knowledge of HCI design and its impact on human performance can advance safety in health care.


Introduction 

In health care, as in other domains, the expectations surrounding new and as-yet-unproven technologies often are far more optimistic than is reasonable. These new technologies often are sold on the basis of their presumed positive impacts on human performance. For example, clinical information systems have been advocated to reduce the risk of adverse drug events at each stage in the medication administration process.^ These systems, including computerized physician order entry (CPOE), automated dispensing systems, and barcode technology, achieve this new level of “safety” through reduced reliance on memory, increased access to information, and increased compliance with “best practice” procedures.^

But in addition to providing new capabilities, new technologies also affect the technical, social, organizational, economic, cultural, and political dimensions of work in new and different ways. Observations of new technology implementations have shown that a change in technology literally alters roles, strategies, and paths to failure.^ Recognizing this, the Institute of Medicine report To Err Is Human recommends examining new technologies for “threats to safety and redesign(ing) them before accidents occur.”^ To minimize harm, we propose using proactive testing methods to anticipate the side effects of introducing clinical information systems into work practice.

This paper describes a method of scenario-based usability testing and its usefulness in identifying negative, unanticipated side effects in a clinical information system. The main advantage of this approach is its ability to identify impact prior to implementation and to suggest redesign before adverse events or injury to patients can occur. The testing results reveal unintended side effects of design decisions based on oversimplified models of the work. Analysts can use the observations to identify critical elements of work processes for maximal or “best” performance with the information system, while generating ideas for long-term system redesign.


Background 

Usability testing

Software applications in the computer industry routinely undergo some type of formal usability testing. This evaluation is particularly important in complex sociotechnical systems, where work is distributed across time and space and multiple tasks continuously compete for the worker’s attention. Other high-consequence industries, such as nuclear power and aviation, are similar to health care in their safety standards and the need to maintain a high level of reliability. In each of these fields, the role and impact of the information system are heightened because of the immediate effect on human lives. In clinical settings, a difficult-to-use interface will affect not only profit and productivity but also patient safety. Clinical information systems should ideally be designed to simplify work processes, resulting in improved efficiency and increased safety. Because time is finite, the relationship between efficiency and safety becomes obvious: if individual tasks are slowed, less time or attention is available for other work tasks, promoting mental slips and predictable human adaptations to workload (e.g., shedding tasks, decreasing performance criteria, and deferring tasks, all generally described as “cutting corners”).^6,7

The traditional usability test involves observing workers as they complete tasks using the computer interface.^8 Usability as a construct has multiple components: learnability, efficiency of use, ease of recall, low error generation, and subjective pleasure.^ Health care brings its own constraints to tool design. The distributed nature of the work requires accurate access to information. The pace and traditions of health care limit access to consistent training, which in turn increases the importance of learnability and ease of use.

Usability testing can be conducted using a variety of methodologies (e.g., thinking aloud, constructive interaction, retrospective testing, and coaching). The most popular method is “thinking aloud,”^ in which users verbalize their thoughts while using the device interface. Thinking aloud allows analysts to better understand the mental model employed by users, as well as the particular aspects of the interface that cause the most problems. The human-computer interaction literature suggests that usability testing with three to five users finds about 85 percent of major interface usability problems.^ Identifying serious usability problems before the software is released improves performance and acceptance of the software. Most usability tests are videotaped to permit analysis of users’ statements of confusion and errors in using the system.
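The three-to-five-user figure is often explained with a simple probabilistic model (associated with Nielsen and Landauer, not with this paper) in which each test user independently finds a fixed share L of all problems, so n users are expected to find 1 − (1 − L)^n of them. The sketch below assumes L = 0.31, an average reported in that outside work; real values vary with the system, the tasks, and the users.

```python
def share_of_problems_found(n_users: int, per_user_rate: float = 0.31) -> float:
    """Expected fraction of usability problems found by n_users testers.

    Assumes each tester independently finds a fixed share of all problems
    (per_user_rate). The default of 0.31 is an outside published average,
    not a value measured in this study.
    """
    if n_users < 0:
        raise ValueError("n_users must be non-negative")
    if not 0.0 <= per_user_rate <= 1.0:
        raise ValueError("per_user_rate must be between 0 and 1")
    # Chance that one particular problem is missed by every tester.
    missed_by_all = (1.0 - per_user_rate) ** n_users
    return 1.0 - missed_by_all


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} user(s): {share_of_problems_found(n):.0%} of problems found")
```

Under these assumptions, three users find roughly two-thirds of the problems and five users roughly 84 percent, close to the figure cited in the literature.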

Scenario-based testing

Usability testing embedded in a scenario allows the simulation to be grounded in observation of the work practice context. Most interface testing is designed around simple, straightforward tasks. Difficulties arising from design decisions are more easily revealed when the testing reproduces “typical” work, with its time pressure, competition for attention, and interruptions. Grounding the testing in the work is necessary, because complexity reveals latent software problems that simple, straightforward repetition often does not. A scenario replicates the use of a system, the user’s interaction with it, and the performance of an activity over a specified period of time. These testing methods provide opportunities to learn how the system actually functions and malfunctions, by showing how practitioners accommodate and adapt to the technology change, without causing patients actual harm. Testing also shows how the clinical information system transforms roles and coordination, and how people adapt to the mix of new capabilities and complexities. This information helps reveal the organizational, design, and training adjustments necessary to make the system more useful while reducing unintended side effects of the change.

Ethnographic observations and structured interviews associated with scenario design

To develop an accurate and representative scenario of the work practice, an intimate picture of how the work is accomplished is created using ethnographic observations from trained observers.^12 Ethnographic observations and structured interviews are conducted in the workplace before the scenario design activities to better understand key aspects of work, particularly those involving communication, collaboration, expertise, in-place safeguards, competing tasks, and interruptions. The observations and interviews facilitate development of a predicted-use model and its positioning in a workplace context. This framework then is used to develop scenario-based usability tests, modeled on the interface features in a typical sequence of events and targeted situations, to test issues identified during the observations. The analyst then can use the scenario design to predict use and sources of difficulty.

To conduct the aforementioned ethnographic observations, trained observers captured detailed data, including (1) observable activities and verbalizations, and (2) subject-reported data about how artifacts (tools) support performance.^ The observers also captured the sequence of events as well as other details of the communication, interactions, and teamwork of the clinician users. Because the information was gathered prospectively, the data quality was high and was judged to be representative of “typical” behavior, as opposed to retrospective or generalized subject-reported behavior obtained through an interview. The risk of behavior modification associated with the observation process itself was minimized through a pilot phase before data collection and by asking the trusted practitioners to judge whether or not they had acted in a typical fashion. The data collected from multiple workers then were studied for themes, patterns, strategies, and tools used to complete the task, and the results of this analysis identified the areas of concentration for developing the scenarios and the usability testing. Reliability and validity were established through triangulation of findings from multiple individuals and multiple data sources.
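As a rough illustration of triangulation, the sketch below treats a theme as corroborated only when it is seen in at least two individuals and through at least two data sources. The `Finding` record format, the theme labels, and both thresholds are hypothetical; the study does not state numeric criteria.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Finding(NamedTuple):
    """One coded observation (hypothetical format)."""
    theme: str       # analyst's label, e.g., "work-around for failed scan"
    individual: str  # de-identified participant code
    source: str      # e.g., "observation" or "interview"


def triangulated_themes(findings: Iterable[Finding],
                        min_individuals: int = 2,
                        min_sources: int = 2) -> set[str]:
    """Return themes supported by enough distinct people and data sources."""
    people = defaultdict(set)
    sources = defaultdict(set)
    for f in findings:
        people[f.theme].add(f.individual)
        sources[f.theme].add(f.source)
    return {
        theme for theme in people
        if len(people[theme]) >= min_individuals
        and len(sources[theme]) >= min_sources
    }
```

A theme reported by several nurses but only through observation would not pass this check until it is also corroborated by another source, such as an interview.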

Structured interviews were used in conjunction with (or in place of) ethnographic observations during data collection to get a more complete understanding of work processes. When researchers observe ambiguous or complex actions, it is important to question the practitioners to elucidate the meaning of those actions. Data collection through structured interviews is replicable in that the same questions are asked with the same words in every interview. Although some aspects of expertise, particularly processes involving physical movement, are believed to be impossible to self-report, it is considered valid to use self-reporting techniques to elicit “textbook” knowledge such as the typical workflow of surgical procedures. Data analysis involves compiling transcribed responses from de-identified interviewees for each question, then synthesizing the views held by the majority of participants and characterizing the variability in perspectives among those interviewed.
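The per-question analysis described above can be sketched as follows, assuming the analysts have already coded each free-text answer into a short view label (a manual step the code does not attempt). The input format and the "more than half" majority rule are illustrative assumptions.

```python
from collections import Counter, defaultdict


def summarize_interviews(responses):
    """Summarize coded interview responses question by question.

    responses: iterable of (question_id, coded_view) pairs, one per
    de-identified interviewee and question.
    Returns {question_id: {"majority": [...], "distribution": {...}, "n": int}}.
    """
    by_question = defaultdict(Counter)
    for question_id, view in responses:
        by_question[question_id][view] += 1

    summary = {}
    for question_id, counts in by_question.items():
        n = sum(counts.values())
        # Views held by more than half of the respondents to this question.
        majority = sorted(v for v, c in counts.items() if c > n / 2)
        summary[question_id] = {
            "majority": majority,
            "distribution": dict(counts),  # characterizes the variability
            "n": n,
        }
    return summary
```

An empty `majority` list flags a question on which interviewees disagreed, which is itself worth reporting alongside the distribution.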


Methods 

The entire scenario-based usability testing process is composed of four major steps: (1) data collection for the work to be studied (e.g., ethnographic observations, structured interviews), (2) scenario development, (3) scenario-based user testing, and (4) data analysis. In this section we describe each step, followed by a case study.

Setting

The Veterans Health Administration (VHA), one of the largest health care systems in the United States, is a leader in the use of medical informatics systems. In 1997, the VHA implemented the Computerized Patient Record System (CPRS),^ which is integrated with the Veterans Health Information Systems and Technology Architecture (VistA) database.^ The VistA database is a collection of tools that permit interfacility networking, data sharing, and specialized central support. A graphical user interface (GUI) for CPRS was later developed and implemented to replace the original command line interface. In 2000, the VHA implemented the barcoded medication administration (BCMA) system, which uses scanned barcodes to ensure that each patient gets the correct medication in the correct dose and route, at the correct time. Figure 1 shows a list of medications for one particular patient and the order in which they are to be given, the “due list.” BCMA has been deployed in all VHA facilities across the United States.

Figure 1. A virtual BCMA due list (version 2.0)
A corporation outside of the VHA introduced the use of wireless medication assistants (WMAs) with built-in barcode scanners to support medication administration. Both the patient’s hospital wristband and his or her medication labels are barcoded for the patient’s safety. The WMA application was developed to emulate an existing BCMA desktop application. The WMA software is loaded on a personal digital assistant (PDA), specifically the Symbol Technologies® Model PPT 2800, with a 206 MHz processor, running the Microsoft® Pocket PC 2002 operating system. The WMA system was used briefly by a small number of VHA facilities before the evaluation began, but a moratorium was placed on its use in the field until the evaluation was completed. The system has since been cleared for use.

Step  1 — Ethnographic  observations 

The scenarios used for testing are designed around problems derived from observations of the work environment in which the software is to be used.

BCMA example

Nurses accessed the BCMA software using a laptop computer fixed to a wheeled medication cart and linked to the VHA’s electronic databases via a wireless network. The nurse scanned the barcode on the patient’s wristband to select that individual’s medication regimen from the database and present it on the computer’s display. Each medication container barcode then was scanned to verify that the medication, dose, route, and administration time matched what was ordered by the patient’s physician. If the drug formulation information associated with the barcode matched the displayed database information, the system recorded the medication as administered by the nurse at the time the wristband was scanned. If the scanned information associated with the medication did not match the patient’s medication orders, a pop-up dialog box appeared on the laptop screen to alert the nurse to the discrepancy.
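The verification step described above can be sketched as a simple matching rule. This is a minimal illustration, not the BCMA implementation: the `Order` and `ScannedItem` records, the exact-string comparison of dose and route, and the 60-minute tolerance around the scheduled time are all hypothetical. The text states only that medication, dose, route, and time are checked, that a match is recorded at the wristband scan time, and that a mismatch raises an alert.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Order:
    """One active medication order for the scanned patient (hypothetical)."""
    drug: str
    dose: str
    route: str
    due: datetime


@dataclass(frozen=True)
class ScannedItem:
    """Drug formulation information looked up from a container barcode."""
    drug: str
    dose: str
    route: str


# Hypothetical tolerance around the scheduled time; not taken from the source.
WINDOW = timedelta(minutes=60)


def verify_scan(orders: list[Order], item: ScannedItem,
                wristband_scanned_at: datetime) -> tuple[bool, Optional[str]]:
    """Check a scanned medication against the scanned patient's orders.

    Returns (True, None) on a match; the caller would then record the
    administration at the wristband scan time. Returns (False, message)
    when the caller should show a discrepancy alert instead.
    """
    same_drug = [o for o in orders if o.drug == item.drug]
    if not same_drug:
        return False, f"{item.drug} is not ordered for this patient"
    for order in same_drug:
        if (order.dose == item.dose and order.route == item.route
                and abs(wristband_scanned_at - order.due) <= WINDOW):
            return True, None
    return False, f"{item.drug}: dose, route, or time does not match the order"
```

Returning the alert text rather than displaying it keeps the matching rule separate from the interface, which is where the usability problems discussed later tend to arise.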

Medication administration was observed in the acute care and nursing home wards of three VHA hospitals. Three observers trained to perform ethnographic observations in complex settings conducted all observations. To minimize the effect of the observations on the behavior of the study participants, no data that could identify person, place, or time of day were collected, nor was any demographic or medication error rate information recorded. We observed nurses using BCMA equipment at one small, one medium, and one large hospital facility for periods ranging from 24 to 31 hours, for a total of 79 hours. Patterson and colleagues^ reported the specifics of the observations and their conclusions in a prior publication. These observations were used to direct the development of scenarios for testing BCMA.

Unlike BCMA, the WMA system could not be studied through ethnographic observation of hospital staff, because it was not in use in any of the studied facilities. To substitute for the data that observation would have provided, structured interviews were used to assist in developing a predicted-use model and to identify potential sources of error. One researcher conducted structured interviews with nurses at two hospitals who had used the WMA system briefly.

Step  2 — Scenario  development 

The first step in developing a scenario is identifying the most problematic areas of work (e.g., from the analysis of the observations) and creating probes of specific elements in the medication administration process that will require decisionmaking in context. In addition to observations, a good source of probes is the stories workers tell about the difficulties of the system implementation. Probes are best developed through a partnership between human performance expertise and clinical expertise.

Usability Testing and Software Systems

The next step in the scenario development process is identifying constraints in terms of work volume, time frame, task complexity, and contextual factors. To the extent possible, complete cases are constructed with laboratory values, radiology results, progress notes, prior medication records, and discharge summaries. This initial effort creates a catalogue of cases that makes additional testing of difficult functionality possible. Access to electronic medical records in a de-identified test account, as well as extensive clinical expertise, facilitates the process.
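One way to organize the catalogue of cases and the scenario constraints is a small record per case and per scenario. The field names, and the idea of tagging each case with the probes it can exercise, are illustrative assumptions rather than the study’s actual format.

```python
from dataclasses import dataclass, field


@dataclass
class Case:
    """A de-identified simulated patient built for testing (hypothetical)."""
    case_id: str
    summary: str
    labs: dict[str, str] = field(default_factory=dict)
    medication_orders: list[str] = field(default_factory=list)
    probes: set[str] = field(default_factory=set)  # difficult functions it exercises


@dataclass
class ScenarioConstraints:
    """Work volume, time frame, and complexity targets for one scenario."""
    patients: int
    time_frame_minutes: int
    interruptions: int
    context: str


def cases_for_probe(catalogue: list[Case], probe: str) -> list[str]:
    """Return ids of catalogued cases that can exercise a given probe."""
    return [case.case_id for case in catalogue if probe in case.probes]
```

Tagging cases with probes makes it quick to assemble a new scenario when a difficult function needs additional testing.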

BCMA example

The scenario used to test BCMA took the form of a “shift change” report and involved administering medication to a number of simulated patients with barcoded wristbands. To imitate the usual (minimal) amount of patient-specific knowledge, part of the testing involved listening to a taped “shift change” report. We created a model of a busy 9 a.m. medication pass, usually with four to six different patients. The test subject (nurse) then chose the order in which the patients would be medicated and began the exercise. Simulated barcodes for patients, medications, and a medication cart were provided to support the testing. Because interruptions are common in nursing work, some were also built into the scenario. As an example, here is the transcript of the shift change report for one of the hypothetical test patients:

Mr. A is an elderly 60 kg BM with a past medical history significant for severe COPD, FEV1 300 cc, and CHF with EF of 20% who presents to ED with 2 days of increased shortness of breath associated with green sputum, PND and orthopnea, and increased LE swelling. He was admitted last night to the ICU. He is visibly anxious with a RR of 40 and ABG 7.31/55/55 on 35% venti mask. CXR remarkable for infiltrate in both bases thought secondary to congestive heart failure. BP is 110/60, P 120, T 100, O2 sat 88% on 40%. The patient has one peripheral IV and a triple lumen in his right subclavian. In the triple lumen, one port has theophylline, another port is for the IV meds.

Sample interruptions in the exercise included the following:

• Ringing telephone (the nurse answered it and heard the following): “This is the lab, can I talk to the nurse caring for Mr. Smith? Hi, we have two critical results on Mr. Smith, XXX-XX-XXXX. In the arterial blood gas, pH of 7.29, PaO2 of 60, and a PaCO2 of 60, and the potassium is 2.9.”

• At one point a nurse manager asked the test subject, “Can you work an extra shift?”

• An alarm for bradycardia was triggered in another room.

• The wife of “Patient #1” told the nurse, “He wants a drink of water!”

• An individual playing a physician approached and asked the nurse, “Are you the nurse for Mr. X? Lab just called me... Mr. X’s creatinine is 2.6, how much digoxin has he gotten?”
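To keep interruptions repeatable from one subject to the next, the tester who introduces them can work from a fixed, timed script. The sketch below returns the scripted interruptions that are due at a given elapsed time, each delivered only once. The `Interruption` type and all timings are hypothetical; the text does not say when each interruption was introduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interruption:
    at_minute: int   # elapsed time into the medication pass (hypothetical)
    channel: str     # "phone", "in person", or "alarm"
    script: str      # what the tester says or triggers


def due_interruptions(plan: list[Interruption], elapsed_minutes: int,
                      already_given: set[int]) -> list[Interruption]:
    """Return interruptions whose time has come and that were not yet given.

    already_given holds indexes into plan and is updated in place, so each
    interruption is delivered once even if the tester checks repeatedly.
    """
    ready = [
        (i, item) for i, item in enumerate(plan)
        if item.at_minute <= elapsed_minutes and i not in already_given
    ]
    ready.sort(key=lambda pair: pair[1].at_minute)
    for i, _ in ready:
        already_given.add(i)
    return [item for _, item in ready]
```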

WMA example

Scenario development for WMA testing was adjusted from the BCMA example to better explore how the small screen on the PDA affected access to important data (e.g., patient identifiers, allergy information, medication administration history). The following is an excerpt from the shift change report transcript for one of the hypothetical test patients:

Room 45 - Bed 2, Mr. X is a 90-kg BM with a past medical history significant for diabetes. Was admitted to the VA 4 times in this year, underwent Fem-pop bypass left. Has a left second and third toe amputation and has a decub ulcer too. He was admitted last night to the ward. BP is 132/60, P 98, T 97. The patient has one peripheral IV and a triple lumen in his right subclavian. In the triple lumen, one port has theophylline, another port is for the IV meds.

This scenario differs slightly from the BCMA example in that it was designed to test the impact of the WMA screen design on the display of patient identifiers and medication administration history.

Step  3 — User  testing 

As outlined in the Background section, usability testing is a method for examining the interaction between the user and the computer interface. Part of the testing plan includes determining the appropriate number of practitioners to be tested and the level of expertise needed in the test subjects. Generally, tests include both novice and expert users. Subjects are solicited from the hospital staff through advertisements (posters, etc.) displayed in relevant work units. To prevent a potential conflict of interest, testing must be scheduled for a time when the participants are not being paid by the hospital. Subjects are selected in the order in which they volunteer, and the advertisements should state the duration of the tests, the expectations, the fact that video and audio taping may be involved, and the specific functionalities to be tested. Such testing often must be cleared by union and hospital leadership.
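First-come, first-served selection can be reconciled with the goal of testing both novice and expert users. In this sketch, which goes beyond what the text specifies, the earliest volunteer at each expertise level is reserved a slot and the remaining slots are filled strictly in sign-up order; the reservation rule and the `Volunteer` fields are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Volunteer:
    code: str            # de-identified volunteer code
    signed_up: datetime
    level: str           # "novice" or "expert"


def select_subjects(volunteers: list[Volunteer], slots: int) -> list[Volunteer]:
    """Pick subjects in sign-up order, reserving one slot per expertise level."""
    queue = sorted(volunteers, key=lambda v: v.signed_up)
    chosen: list[Volunteer] = []
    # Reserve the earliest volunteer at each level, if any signed up.
    for level in ("novice", "expert"):
        first = next((v for v in queue if v.level == level), None)
        if first is not None and len(chosen) < slots:
            chosen.append(first)
    # Fill the remaining slots strictly by sign-up order.
    for v in queue:
        if len(chosen) >= slots:
            break
        if v not in chosen:
            chosen.append(v)
    return sorted(chosen, key=lambda v: v.signed_up)
```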

A pilot test often is conducted, owing to the complexity of the information system setup and the clinical topics covered. The pilot test verifies that the cases are available in the dataset, the barcodes are correct, and the interface is operable from the testing location, while also helping those running the test to create a smooth, consistent testing environment. Pilot subjects are usually tested at least 24 hours before the actual tests, and the data collected during pilot testing are not usually included in the data analysis.
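Some of these pilot checks can also be scripted as a pre-test check against the test account. The sketch assumes the account can be queried for the barcodes it recognizes and for the orders shown on each display tab, represented here as plain sets and dictionaries; those interfaces are hypothetical. It also checks the 24-hour lead time between the pilot and the first actual test.

```python
from datetime import datetime, timedelta


def preflight_problems(scenario_barcodes: set[str],
                       known_barcodes: set[str],
                       expected_by_tab: dict[str, set[str]],
                       displayed_by_tab: dict[str, set[str]],
                       pilot_at: datetime,
                       first_test_at: datetime) -> list[str]:
    """List problems to fix before testing; an empty list means ready."""
    problems = []
    # Every barcode printed for the scenario must resolve in the test account.
    for code in sorted(scenario_barcodes - known_barcodes):
        problems.append(f"barcode {code} not found in test account")
    # Every expected order must actually appear on its display tab.
    for tab, expected in sorted(expected_by_tab.items()):
        missing = expected - displayed_by_tab.get(tab, set())
        for order in sorted(missing):
            problems.append(f"order {order!r} missing from tab {tab}")
    # Leave at least 24 hours between the pilot and the actual tests.
    if first_test_at - pilot_at < timedelta(hours=24):
        problems.append("pilot is less than 24 hours before testing")
    return problems
```

Rerunning such a check after any software change to the test system can catch, for example, orders that silently stop appearing on a tab.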

We have found that having two testers present during the exercise is helpful: one interacts with the nurse subject (i.e., introduces the interruptions), while the other collects data (e.g., taking notes and reminding the test subject to verbalize their thoughts as they work through the exercise). A debriefing interview is conducted after the test to obtain further information from the test subject about their opinion of the interface and the confusion and difficulties they experienced with the software. An industrywide usability questionnaire^ allows the test subject to rate their level of satisfaction with the interface and its impact on their work. The actual testing process includes the following:

• Written introduction to the testing process

• Introduction to the scenario (e.g., listening to the “shift change” report)

• Complete simulation (e.g., passing barcoded medications to simulated patients with barcoded wristbands, with frequent interruptions by phone and in person during the medication pass), verbalizing their thoughts throughout the test (“thinking aloud”)

• Debriefing interview and satisfaction questionnaire following the test

Great care should be taken to ensure that all testing is done in a well-identified test account, so that the data can be manipulated without affecting the records of real hospital patients. For the simulation to be successful, data in the test account must be representative of the data available to practitioners in the live patient account.
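A simple safeguard is for any test tooling to refuse to write unless it is pointed at an account explicitly flagged for testing. The `Account` type, its `is_test` flag, and the write function below are hypothetical stand-ins; real systems mark test patients and accounts in their own ways.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """A patient data account (hypothetical stand-in)."""
    name: str
    is_test: bool
    records: dict[str, list[str]] = field(default_factory=dict)


class NotATestAccountError(RuntimeError):
    """Raised when test tooling is pointed at a live account."""


def record_simulated_administration(account: Account, patient_id: str,
                                    medication: str) -> None:
    """Write a simulated administration, but only into a test account."""
    if not account.is_test:
        raise NotATestAccountError(
            f"refusing to modify {account.name!r}: not flagged as a test account")
    account.records.setdefault(patient_id, []).append(medication)
```

Failing loudly is preferable to silently skipping the write, because a misconfigured test session should stop before any subject is tested.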

BCMA example

Usability testing for Version 1 of BCMA involved five nurse subjects, each participating for 90 to 120 minutes. Additional testing was done on subsequent versions with the same number of participants. Each test followed an identical agenda:

• Each study participant was scheduled and paid for 2 hours of their time.

• The purpose of the study was described to each participant during the testing session, informed consent forms for video and audio taping were signed, and the participants practiced “thinking aloud” with standard practice examples (e.g., multiply 24 times 34). A taped “shift change” report was then played and/or read (a standard practice during shift changes on acute care wards in VHA hospitals).

• The participant was then instructed to play the shift change update, which could be played multiple times, while taking notes.

• The participant was provided with a medication cart featuring an attached laptop computer and barcode scanner identical to those used on the patient wards. Barcoded wristbands and empty medication packages with appropriate barcodes were provided for scanning and simulating the administration of the medication.

• The participants were required to answer the telephone and provide simulated responses to requests from testers who interrupted their procedural work.

• At the end of the session, a debriefing interview was conducted to better identify and understand activities that occurred during the simulation, and a short usability questionnaire^ was completed.

WMA example

The usability testing for the WMA device involved a total of five subjects (one of whom was used to pilot test the scenarios and the testing process as tailored for this system), each participating for 90 to 120 minutes. Shortly before the usability testing was originally scheduled, a program patch had been installed to eliminate unexpected side effects. However, the patch caused the orders that should have appeared on the IV page tab to go missing, which forced rescheduling of the test and further reinforced the importance of a pilot run.

Step  4 — Data  analysis 

Following the user testing, the analysts list the common sources of interface difficulty experienced by the users. The videotape of each testing session then is reviewed, the patterns of use are counted, and notes are made on evidence of adaptation to work constraints (e.g., deferring tasks, shedding tasks, decreasing performance) and on the perceived need for artifacts. User actions and verbalizations are analyzed for confusion and difficulties in meeting task goals, and for time spent on tasks. The resulting list of interface problems then is prioritized on the basis of risk to patients and ease of improvement (i.e., low hanging fruit, critical, moderate, and long-term change). This analysis is done to advance a dialogue with the designers on the best methods for managing and allocating available patient safety resources. The success of a usability test often is measured by the positive changes that occur in the interface design and by the specific strategies implemented to improve the interface’s ease of use (e.g., training).
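The counting and prioritization step can be sketched as below. The paper names the two criteria (risk to patients and ease of improvement) and the four categories, but not how they are combined; the 1-to-3 scales and the mapping rules here are assumptions for illustration only.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Problem:
    description: str
    risk: int     # 1 = low, 2 = moderate, 3 = high risk to patients (assumed scale)
    effort: int   # 1 = easy, 2 = moderate, 3 = hard to change (assumed scale)


def category(p: Problem) -> str:
    """Map risk and ease of improvement onto the four categories (assumed rules)."""
    if p.risk == 3:
        return "critical"
    if p.effort == 1:
        return "low hanging fruit"
    if p.effort == 3:
        return "long-term change"
    return "moderate"


def prioritize(observed: list[Problem]) -> list[tuple[str, int, str]]:
    """Count how often each problem was seen across sessions, then rank it.

    Returns (description, times observed, category) tuples, highest risk
    first, then most frequently observed, then easiest to fix.
    """
    counts = Counter(observed)
    ranked = sorted(counts, key=lambda p: (-p.risk, -counts[p], p.effort))
    return [(p.description, counts[p], category(p)) for p in ranked]
```

Keeping the observation counts next to the category gives designers both the severity and the frequency evidence when they allocate resources.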

BCMA  example 

Findings  from  an  analysis  of  the  BCMA  scenario-based  usability  testing  data 
include  the  following: 

•  Practitioners  did  not  complete  tasks  when  automated  actions  occurred 
without  their  knowledge  (e.g.,  medication  orders  dropped  off  the 
BCMA  record  automatically  after  a  period  of  time,  whether  or  not  the 
medication  had  been  administered). 
• Data that appear on the display only when selected may be ignored or
forgotten (e.g., medications that are visible only when certain filters
are on may be missed if there is no visual cue to remind the user that
the data have been hidden).

• Nonroutine activities that are part of the workflow process are not
effectively supported by the system interface (e.g., users are required
to leave the BCMA system and enter another to "undo" an action).

These results led to redesigns in the software so that (1) medications set to
expire will not be removed from a patient's records without a nurse first being
notified, (2) the default filter on the list of pending medications now displays all
medications (i.e., one-time and interval dosages), and (3) a provision has been
included in the BCMA graphical interface that allows nurses to note medications
withheld or refused by the patient.
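The three BCMA redesigns can be illustrated with a toy data model. This is a hypothetical sketch, not the VA's BCMA code: `MedOrder`, `Status`, `sweep_expired`, `apply_filter`, and `record_not_given` are invented names, and the rule that an expired, undocumented order is kept for one sweep after the nurse is notified is an assumption about how "notify before removal" might be enforced.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Callable


class Status(Enum):
    PENDING = "pending"
    GIVEN = "given"
    HELD = "held"        # withheld by the nurse
    REFUSED = "refused"  # refused by the patient


@dataclass
class MedOrder:
    drug: str
    kind: str             # "one-time" or "interval"
    expires_at: datetime
    status: Status = Status.PENDING
    nurse_notified: bool = False


def sweep_expired(orders: list[MedOrder], now: datetime,
                  notify: Callable[[MedOrder], None]) -> list[MedOrder]:
    """Return the orders that stay on the record after an expiration sweep.

    An expired order that was never documented is not dropped silently:
    the first sweep notifies the nurse and keeps it; a later sweep drops it.
    """
    kept = []
    for order in orders:
        if now < order.expires_at:
            kept.append(order)              # still active
        elif order.status is not Status.PENDING:
            continue                        # documented (given/held/refused): safe to drop
        elif not order.nurse_notified:
            notify(order)                   # warn the nurse before any removal
            order.nurse_notified = True
            kept.append(order)
        # otherwise: expired, still pending, nurse already notified, so drop it
    return kept


def apply_filter(orders: list[MedOrder],
                 kinds: tuple[str, ...] = ("one-time", "interval"),
                 ) -> tuple[list[MedOrder], int]:
    """Show every kind by default; also return how many orders are hidden,
    so the display can cue the user that a filter is hiding data."""
    shown = [o for o in orders if o.kind in kinds]
    return shown, len(orders) - len(shown)


def record_not_given(order: MedOrder, status: Status) -> None:
    """Document a medication withheld by the nurse or refused by the patient."""
    if status not in (Status.HELD, Status.REFUSED):
        raise ValueError("status must be HELD or REFUSED")
    order.status = status
```

Returning the hidden count alongside the filtered list is one way to make the missing visual cue described in the findings hard for a display layer to omit.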

WMA example

In the WMA application, the usability tests identified the following
shortcomings in the interface design:

• The placement of the virtual keyboard on the display screen obscured
the nurse's view of crucial patient allergy data and had a significant
impact on usability and safety.

• The small screen size eliminated key information from the display
(e.g., patient identification information was displayed only after the
patient's records had been loaded from the database).

• Like BCMA, the WMA interface should support report generation
(e.g., the value of paper printouts as a supplement to PDA interfaces
should be considered, since printouts support efficiently accessing and
interpreting relationships within large collections of data).

These results led to significant improvements in the software interface: (1)
two forms of positive patient identification (patient name and social security
number) now are displayed at all times once the patient's record is loaded and
confirmed; (2) the virtual keyboard no longer covers the lower portion of the
allergies list; and (3) more interrelated information now is shown in parallel, and
the process for retrieving information has been simplified.
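A layout rule such as "the virtual keyboard must never cover the allergies list or the patient identification" can be checked mechanically with rectangle overlap. The sketch below is hypothetical: the `Rect` type, the region names in `MUST_STAY_VISIBLE`, and any pixel coordinates used with it are assumptions for illustration, not the WMA device's actual screen geometry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int  # left edge, in screen pixels
    y: int  # top edge (y grows downward)
    w: int
    h: int

    def overlaps(self, other: "Rect") -> bool:
        # Axis-aligned rectangles overlap unless one lies fully beside,
        # above, or below the other (shared edges do not count as overlap).
        return not (self.x + self.w <= other.x or other.x + other.w <= self.x
                    or self.y + self.h <= other.y or other.y + other.h <= self.y)


# Hypothetical regions that must stay visible whenever a record is loaded.
MUST_STAY_VISIBLE = ("patient_id", "allergies")


def blocked_regions(layout: dict[str, Rect], keyboard: Rect) -> list[str]:
    """Names of must-stay-visible regions that the virtual keyboard would cover."""
    return sorted(name for name in MUST_STAY_VISIBLE if layout[name].overlaps(keyboard))
```

For example, with a keyboard occupying the bottom 100 pixels of a 320-pixel-tall screen, an allergies panel placed at `Rect(0, 200, 240, 60)` would be reported as blocked, while one moved up to `Rect(0, 30, 240, 60)` would not. A check like this only catches geometric occlusion; scenario-based testing is still needed to learn which regions nurses actually rely on.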


Discussion 

Scenario-driven usability tests are routinely created for innovations in
software. Usability testing in the software industry typically involves a user
performing a series of often unrelated single tasks (e.g., open a program, save a
file) without performance pressure. Software used in health care can prevent
patient injury, but it can also contribute to injury when usability testing does not
account for the effects of working conditions and decisionmaking complexity.
Testing that does account for them might prevent accidents, yield "reasonable"
safeguards that are truly effective (allergy notification, for example, routinely is
missed if it is not placed as a visible warning or on multiple screens), and push
design toward software that simplifies work rather than adding new tasks.

The scenario-based usability testing we conducted for the BCMA and WMA
systems in the Veterans Health Administration identified six negative, unintended
side effects with the potential to create new paths to errors: (1) automated removal
of medications in the BCMA system caused confusion; (2) poorly organized data
screens resulted in missed medications; (3) users had to exit one system and log
into another to complete documentation; (4) portions of the data display screens
were blocked by the virtual keyboard; (5) key information from the BCMA
system was not replicated on the WMA system; and (6) the WMA system would
not support report printout generation, despite the nurses' need for it.

The scenario-based testing results revealed gaps between the conceptual
model of the system and that of work practice. During the testing, when the nurses
discovered that medications were missing from the interface, confusion ensued. In
most of the tests, however, the nurses realized that the medications automatically
removed from the system should have been given, and they eventually administered
them. The nurses said this type of design decision created a new potential path to
missed medications.

The transition from the desktop BCMA interface to a hand-held WMA device
required more design innovation than simply shrinking the larger laptop computer
screen. Because of space limitations, deciding which information matters most
and how to organize it on the screen becomes far more consequential as design
trade-offs are made. And while the virtual keyboard is necessary for operating the
tool, the information that it hides is also necessary for making decisions and
providing complete care. Given that missing information is known to degrade
performance with the desktop version of the system, the visual layout of the
hand-held device and the methods for organizing the displayed information are
even more critical. Identification information, for example, is key to patient safety
and should be visible at all times.

The use of the WMA tool does not occur in a vacuum. In fact, because it is a
mobile technology, it will be used in situations and circumstances where the
desktop version is not practical. Similarly, work patterns developed through the
use of the desktop version will be transferred to the WMA model. These patterns
will be strengthened through the unique capabilities of the hand-held device. The
scenario-based testing revealed that the work patterns did not change with the
hand-held device, and the nurses still preferred to use the written reports that the
desktop version can produce. Limitations of the WMA created task complications,
as users found themselves using the laptop computer for some tasks and the
hand-held unit for others.

Conclusion

Clinical information systems, by definition, are used to display volumes of
information important to the care of patients. Design strategies are used to
organize the interface in such a way that the most important information is
available at a glance, without overwhelming the user. This is difficult to
accomplish when there is a disconnect between actual work practice and the
system's design. Scenario-based testing can provide results that help designers
organize the interface in ways that support memory and help users recover from
errors. At the same time, however, this type of testing is not designed to
determine whether the process itself is flawed.

Currently, our ability to predict the impacts of new technologies prior to their
introduction is limited by our understanding of how technologies affect work
practice. Scenario-based usability testing results will feed back into a research
base, providing further insights into how the dimensions of a technology affect
the work it exists to support. Increased understanding of these dimensions will
enable us to make design changes prior to implementation that improve the
technology's usefulness and reduce unintended side effects, at a point in the design
process when changes are much less expensive and less risky to make.